Diversified Random Forests Using Random Subspaces

نویسندگان

  • Khaled Fawagreh
  • Mohamed Medhat Gaber
  • Eyad Elyan
چکیده

Random Forest is an ensemble learning method used for classification and regression. In such an ensemble, multiple classifiers are used where each classifier casts one vote for its predicted class label. Majority voting is then used to determine the class label for unlabelled instances. Since it has been proven empirically that ensembles tend to yield better results when there is a significant diversity among the constituent models, many extensions were developed during the past decade that aim at inducing some diversity in the constituent models in order to improve the performance of Random Forests in terms of both speed and accuracy. In this paper, we propose a method to promote Random Forest diversity by using randomly selected subspaces, giving a weight to each subspace according to its predictive power, and using this weight in majority voting. Experimental study on 15 real datasets showed favourable results, demonstrating the potential of the proposed method.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

A Replicator Dynamics Approach to Collective Feature Engineering in Random Forests

It has been demonstrated how random subspaces can be used to create a Diversified Random Forest, which in turn can lead to better performance in terms of predictive accuracy. Motivated by the fact that each subsforest is built using a set of features that can overlap with those sets of features in other subforests, we hypothesise that using Replicator Dynamics can perform a collective feature e...

متن کامل

A Maximally Diversified Multiple Decision Tree Algorithm for Microarray Data Classification

We investigate the idea of using diversified multiple trees for Microarray data classification. We propose an algorithm of Maximally Diversified Multiple Trees (MDMT), which makes use of a set of unique trees in the decision committee. We compare MDMT with some well-known ensemble methods, namely AdaBoost, Bagging, and Random Forests. We also compare MDMT with a diversified decision tree algori...

متن کامل

Combining Bagging and Random Subspaces to Create Better Ensembles

Random forests are one of the best performing methods for constructing ensembles. They derive their strength from two aspects: using random subsamples of the training data (as in bagging) and randomizing the algorithm for learning base-level classifiers (decision trees). The base-level algorithm randomly selects a subset of the features at each step of tree construction and chooses the best amo...

متن کامل

Toward Multi-Diversified Ensemble Clustering of High-Dimensional Data

The emergence of high-dimensional data in various areas has brought new challenges to the ensemble clustering research. To deal with the curse of dimensionality, considerable efforts in ensemble clustering have been made by incorporating various subspace-based techniques. Besides the emphasis on subspaces, rather limited attention has been paid to the potential diversity in similarity/dissimila...

متن کامل

Hybrid weighted random forests for classifying very high-dimensional data

Random forests are a popular classification method based on an ensemble of a single type of decision trees from subspaces of data. In the literature, there are many different types of decision tree algorithms, including C4.5, CART, and CHAID. Each type of decision tree algorithm may capture different information and structure. This paper proposes a hybrid weighted random forest algorithm, simul...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2014